- What is a self-join?
A self-join joins a table with itself using table aliases.
Example:
SELECT e1.name AS Employee, e2.name AS Manager
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.emp_id;
- Explain the difference between INNER JOIN and OUTER JOIN.
INNER JOIN: Returns matching records only.
OUTER JOIN: Returns matching and non-matching records.
Example:
SELECT * FROM emp INNER JOIN dept ON emp.dept_id = dept.id;
SELECT * FROM emp LEFT JOIN dept ON emp.dept_id = dept.id;
- What are indexes and their disadvantages?
Indexes speed up query performance.
Disadvantages:
- Extra storage space
- Slower INSERT/UPDATE/DELETE operations
- Can fragment over time
- What is referential integrity?
It ensures that foreign key values always match a valid primary key in another table.
Example:
ALTER TABLE orders ADD CONSTRAINT fk_cust FOREIGN KEY(cust_id)
REFERENCES customers(cust_id);
- Explain the difference between HAVING and WHERE clauses.
WHERE: Filters rows before grouping
HAVING: Filters groups after aggregation
Example:
SELECT dept_id, COUNT(*) FROM emp
WHERE salary > 5000
GROUP BY dept_id
HAVING COUNT(*) > 3;
- What are isolation levels in transactions?
- Read Uncommitted : Dirty reads possible
- Read Committed : Prevents dirty reads
- Repeatable Read : Prevents dirty and non-repeatable reads
- Serializable : Full isolation, slowest
- What are phantom reads?
- Occurs when new rows are added or deleted by another transaction between successive reads.
- Prevented by: Serializable isolation level.
- What is a cursor?
A cursor allows row-by-row processing of query results.
Example:
DECLARE cur CURSOR FOR SELECT name FROM employees;
OPEN cur;
FETCH NEXT FROM cur;
- What is a surrogate key?
A system-generated unique identifier (e.g., auto-increment ID) used instead of a natural key.
Advantage: Stability and simplicity
- What is a materialized view?
A materialized view stores the query result physically, unlike a normal view. It can be refreshed periodically to improve performance.
- Explain two-phase commit (2PC) protocol.
- Used in distributed databases for atomic transactions.
- Phase 1 (Prepare): Coordinator asks participants to prepare
- Phase 2 (Commit): If all agree, commit; else rollback
- What is sharding?
Sharding divides large datasets across multiple machines to improve scalability and performance. Each shard holds a subset of the data.
- What is a database schema vs an instance?
- Schema: Structure or blueprint of database (tables, relationships)
- Instance: Actual data stored at a given moment
- What is data warehousing and ETL?
- Data Warehouse: Central repository for analytical data
- ETL: Extract -> Transform -> Load process to integrate data from multiple sources for analysis